Hero Replacement: Determining the Best Alternatives for a Banned Hero in DotA 2

Data Mining and Wrangling Final Project

Submitted by Learning Team 8:
Justine Buno
Aries de Guzman
Ray Franco Rivera
Paul Michael Uy

This version of the notebook has been edited to exclude the source code. You may contact the researchers if you wish to see the source code.

1. Executive Summary

The rise of e-Sports and DotA 2 has paved the way for a brand new industry: the competitive gaming industry. With prize pools reaching as much as $30 million, winning a game of DotA 2 has become increasingly important. One of the most crucial factors in winning a DotA 2 game is drafting, wherein each team picks five (5) heroes and bans six (6) heroes to disrupt the opposing team's strategy.

In the event that a key hero is banned, what are the best alternative heroes to pick based on their game impact?

To help DotA 2 players answer this question, the researchers clustered heroes based on their game impact using data scraped from OpenDota's API. The data consists of premium matches from August 27, 2018 to June 16, 2019, amounting to 15,260 hero observations with 14 features.

The K-Means algorithm was used to perform the clustering, as it is well suited to continuous data with few features. Once the heroes are clustered, heroes similar to a banned key hero can be retrieved, using Euclidean distance, from the cluster (role) in which the team intended to play it. The analysis showed that heroes can be clustered into three (3) categories: Core, Utility, and Support. Based on the top features identified per cluster, Core heroes focus on dealing damage throughout the game. Utility heroes are versatile and can be played as either a core or a support depending on the situation. Lastly, Support heroes focus on maximizing the whole team's advantage by supporting the cores and buying items that benefit the whole team.

K-Means clustering successfully identified three (3) categories that are consistent with the ground truth of hero categories. The analysis also found that certain heroes can be played in multiple roles, depending on the game strategy. This allows teams to draft with ambiguity, which can be leveraged as an advantage against an opposing team. Lastly, identifying a similar hero within a given cluster helps minimize a team's deviation from its original strategy.

For future study, the researchers recommend further optimizing the data by separating won and lost games. It is also recommended that the data be limited to a specific DotA 2 patch and continuously updated to maintain relevance to the current metagame.

2. Introduction

In every sport, the goal of every team is to win. One of the most important factors in winning a DotA 2 game is properly executing a team's game strategy by drafting the right heroes. However, in professional games, each team is only allowed to pick five (5) heroes and ban six (6) heroes to hamper the opposing team's strategy. Bans force teams to adjust their strategy, and these adjustments may cause their defeat. The objective of this study is to minimize the deviation from a team's overall strategy when a key hero is banned by recommending the most similar heroes based on their game impact.

2.1 DotA 2 Basics

DotA 2 is a team-oriented game pitting two teams of five players against each other. A game is won by destroying the enemy's Ancient building before they destroy yours. The Ancient is the largest building and is centrally located in each team's base. The teams are often referred to as the Radiant and the Dire (see Figure 1).

Figure 1. Radiant vs Dire

The goal of each team is to spend time gaining resources such as experience and gold while limiting and reducing the opposing team's resources. The team with greater resources will have a bigger advantage, enabling them to destroy important objectives and eventually, the enemy's Ancient.

At the start of each game, each player controls a unique hero from a pool of 117 heroes. A typical professional match starts with each team's captain alternately banning six (6) heroes and drafting five (5) heroes (see Figure 2).

Figure 2. Captains Mode picking order

A captain drafts heroes that fit the team's strategy and bans heroes that could disrupt it or that would synergize with the opposing team's strategy.

2.2 Objective of the Study

The objective of this study is to cluster heroes based on their game impact and identify alternative heroes most similar to a banned hero in the event that an opposing team disrupts a team's key strategy by banning their key hero.

2.3 Significance of the Study

The e-Sports industry is a rapidly growing industry that is poised to reach new heights this year. According to recent studies, the total revenue of the e-Sports market is projected to hit $1.1 billion this year, 26.7% higher than the previous year.

Due to high tournament prize pools reaching as much as $30 million, traditional sports teams have been investing in e-Sports teams to get their start in the industry.

In every competitive sport, winning is crucial, and DotA 2 is no different. A big determinant of winning DotA 2 games is how a team drafts to execute its game strategy.

2.4 Scope and Limitation

This study focuses only on the premium matches in the recent 2019 DotA Pro Circuit, regardless of the game result. This means that all games, whether won or lost, are included. Another dimension that was not explored in the analysis is game duration, which may affect the distribution of values for each hero observation.


3. Data Description

Just like any other competitive sport, DotA 2 generates a wealth of statistics. For this particular analysis, the researchers focused on premium matches from 27 August 2018 to 16 June 2019.

Why this period? Similar to an NBA season, DotA 2 has the so-called DotA Pro Circuit, or DPC for short. The DPC is composed of ten (10) tournaments in which top-placing teams earn qualifying points for the most prestigious DotA 2 tournament held every year, The International, commonly referred to as TI. By analogy, TI is like the NBA Playoffs. Thus, to stay relevant to the current metagame, the researchers collected match data corresponding to one (1) DPC season, which spans roughly one (1) year.

For more info about the DPC, you can visit https://www.dota2.com/procircuit.

*NOTE: `premium` matches are DPC games, which are, in essence, professional DotA 2 matches.

3.1 Data Source

Premium matches are identified by Match IDs, which were obtained using OpenDota's data explorer (see Figure 3). The data explorer is a tool provided by OpenDota for running advanced SQL queries on its database of professional matches. The retrieved Match IDs were then stored in a .csv file named premium match ids.
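The Data Explorer step can be illustrated with the sketch below, which builds a request URL for premium match IDs in the study period. The SQL text, the table and column names it uses (matches, leagues, leagueid, tier, start_time), and the endpoint https://api.opendota.com/api/explorer are assumptions about OpenDota's explorer schema; this is not the researchers' actual query.

```python
from urllib.parse import urlencode

# ASSUMPTION: explorer schema (matches/leagues tables, leagues.tier = 'premium')
# and the date filter are illustrative; they are not the query used in the study.
PREMIUM_MATCHES_SQL = """
SELECT matches.match_id
FROM matches
JOIN leagues USING (leagueid)
WHERE leagues.tier = 'premium'
  AND matches.start_time >= extract(epoch FROM timestamp '2018-08-27')
  AND matches.start_time <  extract(epoch FROM timestamp '2019-06-17')
ORDER BY matches.match_id
"""

# ASSUMPTION: explorer endpoint that accepts the SQL as a `sql` query parameter.
EXPLORER_ENDPOINT = "https://api.opendota.com/api/explorer"


def explorer_url(sql: str) -> str:
    """Return a GET URL that submits `sql` to the explorer endpoint."""
    # urlencode percent-encodes the SQL (spaces become '+', quotes become %27).
    return f"{EXPLORER_ENDPOINT}?{urlencode({'sql': sql})}"


url = explorer_url(PREMIUM_MATCHES_SQL)
```

The upper bound is exclusive, so using 17 June 2019 keeps matches played on 16 June. The explorer is assumed to return the matching rows as JSON, from which the match_id values could be written to the .csv file.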

Figure 3. OpenDota Data Explorer

After retrieving the Match IDs, the researchers obtained the match data for each Match ID using the OpenDota API, which accepts a Match ID as input and returns the match data as JSON. Each API response was filtered to include only the relevant information. More on this in Section 3.3.
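To show what filtering a response to the relevant information can look like, here is a minimal sketch that flattens one match response into ten per-hero rows. It assumes the response has a top-level match_id and a players list whose entries use the field names in Table 1; the function name filter_match is hypothetical. OpenDota reports hero_id as a numeric ID, so mapping it to names such as brewmaster is assumed to happen in a separate step.

```python
# Per-player fields kept from each API response: the 30 player-level
# columns of Table 1 (match_id is taken from the top level of the response).
PLAYER_FIELDS = [
    "hero_id", "player_slot",
    "ancient_kills", "assists", "camps_stacked", "courier_kills",
    "creeps_stacked", "deaths", "denies", "gold_per_min", "gold_spent",
    "hero_damage", "hero_healing", "kda", "kills", "last_hits", "level",
    "neutral_kills", "obs_placed", "observer_kills", "observer_uses",
    "roshan_kills", "rune_pickups", "sen_placed", "sentry_kills",
    "sentry_uses", "stuns", "tower_damage", "tower_kills", "xp_per_min",
]


def filter_match(match: dict) -> list:
    """Flatten one match response into one row (dict) per hero played."""
    rows = []
    for player in match.get("players", []):
        row = {"match_id": match["match_id"]}
        # Missing fields become None here; nulls are replaced with 0 later (Section 4.3).
        row.update({field: player.get(field) for field in PLAYER_FIELDS})
        rows.append(row)
    return rows
```

Any field not listed in PLAYER_FIELDS is dropped, so each row ends up with the 31 columns described in Section 3.2.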

3.2 Data Structure

The dataframe is composed of 15,260 rows and 31 columns containing the match data of 1,526 premium matches, and it is stored in a database named dmwfinalproject. Each match contributes ten rows, one for each hero played, and each row has 31 columns (or features). Details on how the data was scraped using OpenDota's API are discussed in Section 4.
In [3]:
import sqlite3

import pandas as pd

# Connect to the SQLite database built in Section 4.2
conn = sqlite3.connect('dmwfinalproject.db')

# Load the scraped table into a dataframe
df_ds = pd.read_sql('SELECT * FROM dota2', conn)

# Display the first rows with hero_id as the index (display only; df_ds is unchanged)
df_ds.set_index('hero_id').head()
Out[3]:
ancient_kills assists camps_stacked courier_kills creeps_stacked deaths denies gold_per_min gold_spent hero_damage ... player_slot roshan_kills rune_pickups sen_placed sentry_kills sentry_uses stuns tower_damage tower_kills xp_per_min
hero_id
brewmaster 0.0 6 0.0 0.0 0.0 7 11 414 9460 11937 ... 0 0.0 3.0 0.0 0.0 0.0 41.122314 573 0.0 511
treant 0.0 10 0.0 0.0 0.0 7 1 263 7325 5514 ... 1 0.0 4.0 1.0 0.0 1.0 59.418800 230 1.0 301
necrolyte 0.0 2 0.0 0.0 0.0 8 11 419 10135 11852 ... 2 0.0 3.0 0.0 0.0 0.0 13.081226 1201 1.0 376
furion 0.0 9 0.0 2.0 0.0 7 0 285 7255 11574 ... 3 0.0 3.0 5.0 1.0 4.0 13.773438 2586 1.0 319
mirana 0.0 8 0.0 0.0 0.0 7 10 385 10945 13182 ... 4 0.0 2.0 0.0 0.0 0.0 9.611982 926 0.0 371

5 rows × 30 columns

In [4]:
print("The dataframe's shape is:", df_ds.shape)
The dataframe's shape is: (15260, 31)

3.3 Data Dictionary

The following is the data dictionary of the relevant information gathered from the OpenDota API. "Relevant information" was filtered based on metrics that affect the game impact of a hero. The researchers define game impact as any hero statistic that contributes to the objectives of the game, such as hero damage dealt and tower damage done.

These metrics were carefully chosen by the researchers based on their domain knowledge. Each metric serves as a column or feature described in Table 1. The complete list of fields in OpenDota's API response is available in the OpenDota API documentation at https://docs.opendota.com.


Table 1. Data Dictionary

Features Description
Categorical
hero_id The ID value of the hero played
match_id The ID number of the match assigned by Valve
player_slot Which slot the player is in. 0-127 are Radiant, 128-255 are Dire
Numerical
ancient_kills Total number of Ancient creeps killed by the player
assists Number of assists the player had
camps_stacked Number of camps stacked
courier_kills Total number of courier kills the player had
creeps_stacked Number of creeps stacked
deaths Number of deaths
denies Number of denies
gold_per_min Gold Per Minute obtained by this player
gold_spent How much gold the player spent
hero_damage Hero Damage Dealt
hero_healing Hero Healing Done
kda Kill-Death-Assist ratio
kills Number of kills
last_hits Number of last hits
level Level at the end of the game
neutral_kills Total number of neutral creeps killed
obs_placed Total number of observer wards placed
observer_kills Total number of observer wards killed by the player
observer_uses Number of observer wards used
roshan_kills Total number of roshan kills (last hit on roshan) the player had
rune_pickups Number of runes picked up
sen_placed How many sentries were placed by the player
sentry_kills Total number of sentry wards killed by the player
sentry_uses Number of sentry wards used
stuns Total stun duration of all stuns by the player
tower_damage Total tower damage done by the player
tower_kills Total number of tower kills the player had
xp_per_min Experience Per Minute obtained by the player


4. Data Processing

In this section, the data is collected and translated into usable information. Data processing starts with scraping the raw data through the OpenDota API (documented at https://docs.opendota.com) and then converting it into a more readable format, giving it the form and context needed to be processed by computers and used by the researchers.

4.1 Data Scraping

As mentioned in Section 3.2, the dataframe is composed of 15,260 rows and 31 columns, which were scraped using the OpenDota API and the requests library. Each API response was then parsed using the json library.

The actual scraping code is hidden for privacy purposes; contact the researchers to see the scraper.
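Since the actual scraper is not shown, the following is only a sketch of a polite scraping loop under stated assumptions: it calls the https://api.opendota.com/api/matches/{match_id} endpoint, uses the standard library's urllib instead of the requests library mentioned above so it has no third-party dependency, and waits one second between calls (an assumed pause, not OpenDota's documented rate limit). The names get_json and scrape_matches are hypothetical.

```python
import json
import time
import urllib.request

# ASSUMPTION: OpenDota's per-match endpoint; {match_id} is filled in per call.
MATCH_ENDPOINT = "https://api.opendota.com/api/matches/{match_id}"


def get_json(url: str) -> dict:
    """Download `url` and parse the response body as JSON (no retries)."""
    with urllib.request.urlopen(url, timeout=30) as response:
        return json.loads(response.read().decode("utf-8"))


def scrape_matches(match_ids, fetch=get_json, pause_s=1.0):
    """Fetch each match once, pausing `pause_s` seconds between calls.

    Returns (matches, failed). Network and JSON errors are recorded in
    `failed` as (match_id, message) so one bad ID does not stop the run.
    """
    matches, failed = [], []
    for i, match_id in enumerate(match_ids):
        if i and pause_s:
            time.sleep(pause_s)  # ASSUMPTION: 1 s pause, not a documented limit
        try:
            matches.append(fetch(MATCH_ENDPOINT.format(match_id=match_id)))
        except (OSError, ValueError) as exc:  # URLError subclasses OSError
            failed.append((match_id, str(exc)))
    return matches, failed
```

Passing a different fetch function (for example, one that reads cached JSON files) lets the loop be exercised without network access.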

4.2 Raw Data Storage

For replication of the study and easier reanalysis, the data was stored in a database named dmwfinalproject.
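A minimal sketch of this storage step, assuming the per-hero rows are a list of dictionaries and reusing the dota2 table name queried in Section 3.2; store_rows is a hypothetical helper.

```python
import sqlite3

import pandas as pd


def store_rows(rows, conn, table="dota2"):
    """Write per-hero rows into SQLite, replacing any existing table."""
    df = pd.DataFrame(rows)
    # pandas supports a plain sqlite3 connection directly in to_sql.
    df.to_sql(table, conn, if_exists="replace", index=False)
    conn.commit()
    return len(df)
```

With if_exists="replace", every run overwrites the table; "append" would suit incremental scraping.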

4.3 Feature Selection and Data Cleaning

Since not all features retrieved from the API are needed in the study, it is important to keep only those features that greatly affect a hero's game impact. Based on the researchers' domain knowledge, 17 of the 31 columns were excluded from the feature set: the identifiers (hero_id, match_id, player_slot) and metrics judged redundant or irrelevant to game impact. hero_id is kept in the working data only as a label. This resulted in 14 features, which make up the working data for the study.

The features selected are the following:

  • Ancient kills
  • Camps stacked
  • Denies
  • Gold per minute
  • Hero damage
  • Hero healing
  • KDA
  • Neutral kills
  • Observers placed
  • Rune pickups
  • Sentries placed
  • Stuns
  • Tower damage
  • XP per minute

For the data cleaning part, the output of OpenDota's API is already relatively clean. However, null (None) values appear when a hero has no value for a specific feature. To simplify the data and prepare it for analysis, these null values were replaced with zero (0).
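The selection and cleaning steps above can be sketched in pandas as follows. select_and_clean is a hypothetical name; hero_id is placed last to match the column order shown in Section 4.4.

```python
import pandas as pd

# The 14 game-impact features listed in Section 4.3.
FEATURES = [
    "ancient_kills", "camps_stacked", "denies", "gold_per_min",
    "hero_damage", "hero_healing", "kda", "neutral_kills", "obs_placed",
    "rune_pickups", "sen_placed", "stuns", "tower_damage", "xp_per_min",
]


def select_and_clean(df_raw: pd.DataFrame) -> pd.DataFrame:
    """Keep the 14 features plus hero_id as a label, and replace nulls with 0."""
    df = df_raw[FEATURES + ["hero_id"]].copy()
    df[FEATURES] = df[FEATURES].fillna(0)
    return df
```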

4.4 Polished Data

After feature selection and data cleaning, here is what the working data looks like.

In [12]:
df_dropped.head()
Out[12]:
ancient_kills camps_stacked denies gold_per_min hero_damage hero_healing kda neutral_kills obs_placed rune_pickups sen_placed stuns tower_damage xp_per_min hero_id
0 0.0 0.0 11 414 11937 0 1 27.0 0.0 3.0 0.0 41.122314 573 511 brewmaster
1 0.0 0.0 1 263 5514 4793 1 5.0 0.0 4.0 1.0 59.418800 230 301 treant
2 0.0 0.0 11 419 11852 1709 0 25.0 1.0 3.0 0.0 13.081226 1201 376 necrolyte
3 0.0 0.0 0 285 11574 150 1 5.0 14.0 3.0 5.0 13.773438 2586 319 furion
4 0.0 0.0 10 385 13182 0 1 53.0 0.0 2.0 0.0 9.611982 926 371 mirana

Shown below are the descriptive statistics of each feature.

In [13]:
df_descr_stat = df_dropped.describe().T  # transpose: one row per feature
df_descr_stat
Out[13]:
count mean std min 25% 50% 75% max
ancient_kills 15260.0 6.442857 12.574025 0.000000 0.00 0.000000 6.000000 117.0000
camps_stacked 15260.0 1.175557 1.565125 0.000000 0.00 1.000000 2.000000 22.0000
denies 15260.0 9.073657 8.584859 0.000000 3.00 6.000000 13.000000 76.0000
gold_per_min 15260.0 424.965990 159.568007 105.000000 298.00 404.000000 533.000000 1230.0000
hero_damage 15260.0 15951.867890 11072.757463 597.000000 8426.25 12920.000000 20274.250000 137388.0000
hero_healing 15260.0 1012.285714 2464.154335 0.000000 0.00 0.000000 744.250000 43270.0000
kda 15260.0 3.681979 4.479937 0.000000 1.00 2.000000 5.000000 41.0000
neutral_kills 15260.0 51.445151 57.566622 0.000000 8.00 31.000000 76.000000 515.0000
obs_placed 15260.0 3.328899 5.742847 0.000000 0.00 0.000000 3.000000 50.0000
rune_pickups 15260.0 4.641350 3.468111 0.000000 2.00 4.000000 6.000000 47.0000
sen_placed 15260.0 4.459502 9.048522 0.000000 0.00 0.000000 3.000000 107.0000
stuns 15260.0 29.037792 33.243983 -6.125977 0.00 19.697351 44.008425 374.2066
tower_damage 15260.0 2177.592529 3392.530663 0.000000 97.00 670.000000 2736.000000 28916.0000
xp_per_min 15260.0 468.283617 149.466231 63.000000 357.00 461.000000 575.000000 1052.0000
As seen above, some features (hero_damage, tower_damage, hero_healing) have standard deviations that are large relative to their means, with maximums far above their 75th percentiles (e.g., hero_healing has a mean of 1,012, a standard deviation of 2,464, and a maximum of 43,270). This suggests right-skewed distributions with outliers. This might be because both won and lost games are included, and their values can be quite different from each other. Also, the features are on very different scales (e.g., kda averages about 3.7 while hero_damage averages about 15,952), which would let large-scale features dominate a distance-based clustering if not handled beforehand.
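The effect of unequal scales can be made concrete with a small worked example. The two observations below are hypothetical; only the standard deviations come from the table above. It compares how much of the squared Euclidean distance each feature contributes before and after dividing by its standard deviation.

```python
import numpy as np

# Standard deviations taken from the descriptive statistics table.
std = np.array([4.479937, 11072.757463])  # [kda, hero_damage]

# Two hypothetical observations (illustrative, not from the dataset):
# they differ by 5 in kda and by 1,000 in hero_damage.
a = np.array([2.0, 12000.0])
b = np.array([7.0, 13000.0])

raw_diff = np.abs(a - b)
raw_share = raw_diff**2 / np.sum(raw_diff**2)  # each feature's share of squared distance

scaled_diff = raw_diff / std  # differences in standard-deviation units
scaled_share = scaled_diff**2 / np.sum(scaled_diff**2)
```

Before scaling, hero_damage accounts for more than 99.99% of the squared distance even though the kda gap is far larger relative to its spread; after scaling, kda accounts for about 99%.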


5. Exploratory Data Analysis

Before proceeding with the clustering proper, it is important to look at what the data can tell us so that we know what to expect from the clustering. Below are the 10 most frequently picked heroes in the recent DPC, with their pick counts.

In [14]:
print(hero_count[:10])
[('tiny', 396), ('rubick', 306), ('earth_spirit', 297), ('oracle', 285), ('earthshaker', 284), ('doom_bringer', 282), ('terrorblade', 279), ('brewmaster', 278), ('furion', 265), ('winter_wyvern', 265)]
As seen from the counts, the number of times each hero appeared in matches in the recent DPC varies. This is acceptable because clustering is done per hero observation (one hero in one match) rather than per hero. As a result, some heroes are expected to appear in multiple clusters, since a hero can have a different game impact depending on how it is played.
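Pick counts like those printed above can be produced with collections.Counter, whose most_common method returns (value, count) pairs in the same format. This is a sketch; the researchers' code for hero_count is not shown, and top_picks is a hypothetical name.

```python
from collections import Counter

import pandas as pd


def top_picks(df: pd.DataFrame, n: int = 10):
    """Return the n most frequent hero_id values as (hero, count) pairs."""
    return Counter(df["hero_id"]).most_common(n)
```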

It is also important to look at the distribution of each feature visually. As seen in the figure below, most of the features have right-skewed (right-tailed) distributions, meaning that a few observations have values much higher than the rest. It can also be noted that the scales of the features vary.

In [15]:
# Plotting distributions of each feature
To examine the relationships between features, the researchers used a pairplot to see which features move in the same direction (positively correlated), move in opposite directions (negatively correlated), or have no relationship at all. Correlated features may be dropped at the researchers' discretion.
In [16]:
# Pairplot of each feature
From the figure above, it can be seen that obs_placed and sen_placed are positively correlated. However, the researchers decided to keep both features, since observer and sentry wards are different in-game items: observer wards provide vision, while sentry wards reveal invisible units and enemy wards.
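The pairwise check can also be done numerically. The sketch below lists feature pairs whose absolute Pearson correlation meets a threshold; the 0.5 cutoff and the name correlated_pairs are assumptions for illustration, not values used in the study.

```python
import pandas as pd


def correlated_pairs(df: pd.DataFrame, threshold: float = 0.5):
    """List (feature, feature, r) for pairs with |Pearson r| >= threshold."""
    corr = df.select_dtypes(include="number").corr()  # ignores label columns such as hero_id
    cols = list(corr.columns)
    pairs = []
    for i, left in enumerate(cols):
        for right in cols[i + 1:]:  # upper triangle only: each pair once
            r = float(corr.loc[left, right])
            if abs(r) >= threshold:
                pairs.append((left, right, r))
    return sorted(pairs, key=lambda p: -abs(p[2]))  # strongest first
```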


6. Methodology

The analysis starts with clustering, which groups together data points (in this case, hero observations) with similar game impact based on the features identified by the researchers.

It is important to cluster the heroes prior to recommending alternative heroes to the banned hero to minimize the deviation from the original game strategy. To elaborate, this means that if a team wants a hero to be played for a specific role (which delivers a certain game impact) and it gets banned, the team will first look into the group of heroes with that role and try to find alternative heroes which are most similar to the banned hero in terms of game impact.

Skipping this step would only identify alternatives regardless of the game impact the team wants that hero to provide. For example, a core Windranger delivers a different game impact than a support Windranger; without the clusters, a team could not query specifically for alternatives to a core or a support Windranger.

Below is the step-by-step procedure for arriving at the clusters and identifying alternative heroes:
6.1 Normalizing the Data
6.2 Clustering using K-Means

  • 6.2.1 Determining the Optimal Number of Clusters
  • Clustering Heroes and Naming the Clusters Formed (see Section 7)
Determining the Most Similar Heroes for a Banned Hero (see Section 7.1)

6.1 Normalizing the data

As noted in Sections 4.4 and 5, the features are on very different scales, which could greatly affect the result of the clustering. That is why the researchers opted to normalize all features using StandardScaler (z-score standardization).

Standardization changes only the values of each feature, rescaling it to a mean of zero (0) and a standard deviation of one (1) via z = (x - mean) / std. Because this is a linear transformation, it does not change the shape of each feature's distribution. To validate this claim, the plots of the data before and after normalization are shown below.
In [17]:
df_dropped_normed.head()
Out[17]:
ancient_kills camps_stacked denies gold_per_min hero_damage hero_healing kda neutral_kills obs_placed rune_pickups sen_placed stuns tower_damage xp_per_min
0 -0.512411 -0.751119 0.224396 -0.068725 -0.362602 -0.410818 -0.598684 -0.424655 -0.579679 -0.473285 -0.492859 0.363522 -0.472994 0.285802
1 -0.512411 -0.751119 -0.940484 -1.015061 -0.942693 1.534335 -0.598684 -0.806833 -0.579679 -0.184934 -0.382340 0.913910 -0.574101 -1.119243
2 -0.512411 -0.751119 0.224396 -0.037390 -0.370278 0.282749 -0.821909 -0.459398 -0.405544 -0.473285 -0.492859 -0.479999 -0.287875 -0.617441
3 -0.512411 -0.751119 -1.056972 -0.877184 -0.395386 -0.349943 -0.598684 -0.806833 1.858216 -0.473285 0.059735 -0.459176 0.120388 -0.998811
4 -0.512411 -0.751119 0.107908 -0.250472 -0.250160 -0.410818 -0.598684 0.027010 -0.579679 -0.761636 -0.492859 -0.584360 -0.368938 -0.650895
In [18]:
# Plotting distributions of each feature
In [19]:
# Plotting normalized distributions of each feature

As seen from the two distribution plots above, the distribution (shape) remained but the mean and standard deviations were set to zero (0) and one (1) respectively.
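For reference, the standardization described in this section can be written directly in NumPy. This sketch mirrors what scikit-learn's StandardScaler computes (column mean and population standard deviation); standardize is a hypothetical name. Because the transformation is linear, it preserves the ordering of values within each feature, which is why the distribution shapes are unchanged.

```python
import numpy as np


def standardize(X):
    """Z-score each column: (x - column mean) / column std.

    Uses the population standard deviation (ddof=0), as StandardScaler does.
    Constant columns get a scale of 1 instead of dividing by zero.
    """
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std[std == 0] = 1.0
    return (X - mean) / std, mean, std
```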

6.2 Clustering using K-Means

To cluster the heroes based on their game impact, the K-Means clustering algorithm, known as the simplest and most popular representative-based clustering method, was used. This method is useful for clustering data with few features and mostly continuous variables, and when an available ground truth, based on the researchers' domain knowledge, can suggest an initial number of clusters. The mean of the points in a cluster is chosen as the representative (centroid) of that cluster. To determine the optimal number of clusters, internal validation criteria were used, and the themes extracted from the resulting clusters were analyzed as a sanity check.
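To make the representative-based idea concrete, here is a minimal NumPy sketch of Lloyd's algorithm, the standard K-Means procedure: assign each point to its nearest centroid, then move each centroid to the mean of its members. It is not the implementation the researchers used (the k_means.cluster_centers_ attribute mentioned in Section 7 suggests scikit-learn's KMeans, which uses k-means++ initialization by default); kmeans is a hypothetical name, and initialization here is simply k random observations.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Minimal Lloyd's K-Means. Returns (labels, centroids, sse)."""
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Initialize with k distinct observations chosen at random (not k-means++).
    centroids = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        # Assignment step: nearest centroid by Euclidean distance, shape (n, k).
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Update step: each centroid moves to the mean of its members;
        # an empty cluster keeps its previous centroid.
        new_centroids = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break  # converged
        centroids = new_centroids
    # Final assignment against the final centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    sse = float(np.sum((X - centroids[labels]) ** 2))  # within-cluster sum of squares
    return labels, centroids, sse
```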

6.2.1 Determining the Optimal Number of Clusters (choose best k)

To determine the optimal number of clusters that best fits the data, internal validation criteria were used. Below are the internal validations used to evaluate the clusters formed using K-Means.

  • Sum of Squares Distance to Centroids (SSE) - the sum of squared distances from each point to its cluster's representative point (centroid). For a fixed k, smaller values indicate tighter clusters; however, SSE keeps decreasing as k grows, so it is read for an "elbow" rather than simply minimized.
  • Calinski-Harabasz index (CH) - the ratio of the average between-cluster dispersion to the average within-cluster dispersion. Higher values mean better-defined clusters.
  • Silhouette coefficient (SC) - ranges from -1 to 1. Values close to 1 indicate compact, well-separated clusters, values near 0 indicate overlapping clusters, and negative values indicate that some points are mixed into the wrong clusters.
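The three criteria can be computed from the data and a set of cluster labels as in the sketch below. It follows the standard definitions listed above; the function names are hypothetical, and scikit-learn provides equivalents for the last two (calinski_harabasz_score and silhouette_score in sklearn.metrics).

```python
import numpy as np


def sse(X, labels):
    """Sum of squared distances from each point to its cluster mean."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    return float(sum(np.sum((X[labels == c] - X[labels == c].mean(axis=0)) ** 2)
                     for c in np.unique(labels)))


def calinski_harabasz(X, labels):
    """[between-cluster dispersion / (k - 1)] / [within-cluster dispersion / (n - k)]."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    n, clusters = len(X), np.unique(labels)
    k = len(clusters)
    overall = X.mean(axis=0)
    between = sum(np.sum(labels == c) * np.sum((X[labels == c].mean(axis=0) - overall) ** 2)
                  for c in clusters)
    return float((between / (k - 1)) / (sse(X, labels) / (n - k)))


def silhouette(X, labels):
    """Mean of (b - a) / max(a, b) over all points (needs at least 2 clusters)."""
    X, labels = np.asarray(X, dtype=float), np.asarray(labels)
    clusters = np.unique(labels)
    D = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)  # pairwise distances
    scores = np.zeros(len(X))
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            continue  # singleton clusters score 0 by convention
        a = D[i, same].sum() / (same.sum() - 1)  # mean distance to own cluster (D[i, i] = 0)
        b = min(D[i, labels == c].mean() for c in clusters if c != labels[i])  # nearest other cluster
        scores[i] = (b - a) / max(a, b)
    return float(scores.mean())
```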

Below are the functions that we used to compute the internal validation criteria.

  • cluster_range - runs K-Means on the given design matrix over a range of k values and returns a dictionary of cluster labels and internal validation values
  • plot_internal - plots the results of the internal validation criteria
In [25]:
# Plotting internal validation criteria for various k
Based on the plots above, the SSE elbow occurred at k = 3. The "elbow" is the point beyond which adding more clusters yields only small further reductions in SSE. For CH, the trend is decreasing over the tested range, so it favors the smallest k tested rather than pointing to a distinct optimum. The silhouette coefficient, where higher values are better, is also decreasing, similar to CH, so no conclusive value can be derived from it either. With this, the researchers determined the optimal number of clusters from the SSE plot alone, which gives three (3).
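Reading the elbow off a plot is a judgment call. One common heuristic, shown here only as an illustration and not as the method the researchers used, picks the k at which the SSE curve bends most sharply, measured by the largest second difference. elbow_k is a hypothetical name.

```python
import numpy as np


def elbow_k(ks, sse_values):
    """Return the k with the largest second difference of the SSE curve."""
    sse_values = np.asarray(sse_values, dtype=float)
    if len(ks) < 3:
        raise ValueError("need SSE values for at least three k values")
    # bend[i] = SSE[i] - 2*SSE[i+1] + SSE[i+2], centered on ks[i + 1]
    bend = sse_values[:-2] - 2 * sse_values[1:-1] + sse_values[2:]
    return ks[int(np.argmax(bend)) + 1]
```

For example, with hypothetical SSE values [100, 60, 50, 45, 42] for k = 2 to 6, the function returns 3.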

For the clustering proper, k = 3 will be the basis since K-Means requires a parameter for the number of clusters before performing the clustering.


7. Clustering Proper

The section below discusses the clustering of heroes when k = 3. To separate the groups formed, the researchers appended each observation's cluster label to the dataframe. With each hero observation clustered, the researchers can then obtain the statistics needed to describe the clusters. After that, the most important features per cluster can be identified, which serve as the basis for extracting the theme observed in each cluster.

After clustering each hero observation based on its game impact, the researchers aggregated the observations per cluster by calculating the mean values of the features for each unique hero. Additionally, the most important features for each cluster were identified from the centroid coordinates in k_means.cluster_centers_ (an attribute of the fitted K-Means model), with features ranked by their centroid values.
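The per-hero aggregation can be sketched with a pandas groupby. The column name cluster for the appended label and the function names are assumptions. Because a hero can appear in several clusters, counting unique heroes per cluster can give totals larger than the hero pool.

```python
import pandas as pd


def aggregate_by_cluster(df, features, label_col="cluster", hero_col="hero_id"):
    """Mean feature values for every (cluster, hero) pair present in the data."""
    return df.groupby([label_col, hero_col])[features].mean().reset_index()


def heroes_per_cluster(agg, label_col="cluster", hero_col="hero_id"):
    """Number of unique heroes in each cluster."""
    return agg.groupby(label_col)[hero_col].nunique()
```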

The most important features for each cluster are the following. Cluster 0: stuns, hero_healing, camps_stacked; Cluster 1: obs_placed, sen_placed, hero_healing; Cluster 2: gold_per_min, neutral_kills, xp_per_min.

Below is the complete list of features per cluster ordered by decreasing importance.

In [31]:
# Complete list of features per cluster ordered by decreasing importance
Cluster 0 most important features: 
 ['stuns', 'hero_healing', 'camps_stacked', 'rune_pickups', 'denies', 'kda', 'xp_per_min', 'hero_damage', 'gold_per_min', 'neutral_kills', 'ancient_kills', 'tower_damage', 'sen_placed', 'obs_placed']

Cluster 1 most important features: 
 ['obs_placed', 'sen_placed', 'hero_healing', 'camps_stacked', 'stuns', 'rune_pickups', 'kda', 'ancient_kills', 'tower_damage', 'hero_damage', 'denies', 'neutral_kills', 'xp_per_min', 'gold_per_min']

Cluster 2 most important features: 
 ['gold_per_min', 'neutral_kills', 'xp_per_min', 'tower_damage', 'ancient_kills', 'hero_damage', 'denies', 'kda', 'rune_pickups', 'camps_stacked', 'hero_healing', 'stuns', 'sen_placed', 'obs_placed']
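The feature rankings above are consistent with sorting each centroid's coordinates in decreasing order. The sketch below does that for any array of centroids, such as k_means.cluster_centers_ from a fitted model; treating the centroid value as a feature's importance is an assumption about how these lists were produced, and rank_features is a hypothetical name.

```python
import numpy as np


def rank_features(centers, feature_names):
    """For each centroid row, return feature names sorted by decreasing value."""
    centers = np.asarray(centers, dtype=float)
    order = np.argsort(-centers, axis=1, kind="stable")  # descending per row
    return [[feature_names[j] for j in row] for row in order]
```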

Looking at the list of features above and the distributions of each cluster plotted per feature below, the researchers can then define the themes of each cluster formed. The three (3) clusters were named as Utility (Cluster 0), Support (Cluster 1) and Core (Cluster 2).

Based on the plots below, most of the distribution plots clearly reflect each cluster's characteristics:

  • Utility heroes are almost always in between cores and supports (see gold_per_min, xp_per_min, hero_damage)
  • Supports are the opposite of cores and have high values in obs_placed and sen_placed
  • Cores tend to have higher denies, kda, xp_per_min, hero_damage, gold_per_min, neutral_kills and tower_damage values
In [32]:
# Plotting distributions of features for each cluster

To further visualize the most important features of each cluster, the researchers plotted a radar plot with feature weights as values seen below. The radar plot is consistent with the list of features and distribution plots discussed above.

In [38]:
# Making radar plots

The Utility cluster contains 117 unique heroes, the Support cluster contains 60, and the Core cluster contains 91. These totals exceed the 117-hero pool because the same hero can appear in more than one cluster, depending on how it was played.

A sample of the actual heroes included in each cluster is given below (limited to five (5) heroes only).

In [141]:
# Five random utility heroes
Utility:
['night_stalker' 'magnataur' 'dazzle' 'slardar' 'dark_willow']
In [65]:
# Five random support heroes
Support:
['grimstroke' 'tusk' 'skywrath_mage' 'enigma' 'oracle']
In [72]:
# Five random core heroes
Core:
['slark' 'troll_warlord' 'venomancer' 'viper' 'dragon_knight']

7.1 Determining the Most Similar Heroes for a Sampled Hero

Now that the heroes are clustered, we can perform the second part of the analysis: recommending alternative picks for a banned hero. By calculating the Euclidean distance between heroes within the same cluster's feature space, we can select the n heroes most similar to the banned hero.
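A minimal sketch of this query, assuming an aggregated table with one row per (cluster, hero) pair holding that hero's mean feature values, as described in Section 7. The function name alternatives, the column names, and the default of four results (matching the number returned in the examples) are illustrative choices, not the researchers' code.

```python
import numpy as np
import pandas as pd


def alternatives(agg: pd.DataFrame, hero, cluster, features, n=4,
                 label_col="cluster", hero_col="hero_id"):
    """Return the n heroes in `cluster` closest (Euclidean) to `hero`'s mean profile."""
    pool = agg[agg[label_col] == cluster]  # only heroes played in the same role
    target = pool.loc[pool[hero_col] == hero, features]
    if target.empty:
        raise ValueError(f"{hero!r} has no observations in cluster {cluster!r}")
    others = pool[pool[hero_col] != hero]
    dists = np.linalg.norm(
        others[features].to_numpy(dtype=float) - target.to_numpy(dtype=float)[0], axis=1
    )
    order = np.argsort(dists, kind="stable")[:n]  # nearest first
    return others[hero_col].to_numpy()[order].tolist()
```

Restricting the pool to one cluster is what lets a core Abaddon and a support Abaddon return different alternatives.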

For example, suppose a team wants to pick a support Crystal Maiden, but the opposing team bans it. We then determine the most similar heroes with the sample query shown below.

In [41]:
# Alternatives to a support Crystal Maiden
Querying for an alternative to a support Crystal Maiden:
Out[41]:
['elder_titan', 'ancient_apparition', 'lich', 'rubick']

As another example, suppose a team wants to play Abaddon as a support:

In [42]:
# Alternatives to a support Abaddon
Querying for an alternative to a support Abaddon:
Out[42]:
['dazzle', 'oracle', 'treant', 'chen']

To show the purpose of clustering, suppose that the team wants to play Abaddon as a core:

In [43]:
# Alternatives to a core Abaddon
Querying for an alternative to a core Abaddon:
Out[43]:
['tidehunter', 'viper', 'nevermore', 'dark_seer']

Based on the researchers' domain knowledge, the resulting similar heroes are fair alternatives to the queried heroes in actual DotA 2 games.


8. Conclusions

The researchers were able to identify three (3) clusters based on a hero's game impact. The clusters formed and their general characteristics are the following:

  • Core: Heroes that are designed to be the primary damage dealers throughout the game and are usually allocated the most resources. This is evident in the cluster's top defining features: gold_per_min, neutral_kills, and xp_per_min.
  • Utility: Heroes that are versatile and can adapt to the pace of the game; a mixture of core and support. This is evident in the feature distribution plots, which show utility heroes falling between cores and supports.
  • Support: Heroes that are primarily played to support cores by providing healing and purchasing items that benefit the whole team. This is evident in the cluster's top defining features: obs_placed, sen_placed, and hero_healing.

Additionally, the researchers were also successful in providing a system that can determine an alternative to a banned hero. Based on the researchers' domain knowledge, the resulting similar heroes are fair alternatives to a queried hero during actual DotA 2 games.

For DotA 2 players, the Core cluster roughly corresponds to Positions 1 and 2, Utility to Positions 3 and 4, and Support to Position 5.

For professional teams, apart from using this system to provide quick alternatives to a banned hero, it can also be used for theorycrafting alternative heroes to widen the pocket strategies of the team.

9. Recommendations

The researchers recommend the following to further optimize the system:

  • Consider only won games by separating won and lost games. The researchers believe that the maximum potential of a hero played in a certain role can only be captured in games that are won.
  • Include game duration when normalizing/aggregating hero observations, since game duration can skew a hero's stats.
  • Limit the dataset to a single patch to eliminate the effect of changes to a hero's skill set and other characteristics between patches.
  • Update the database every patch to maintain relevance to the current metagame.

10. References / Acknowledgements

To complete the study, the researchers used the following resources as reference:

[1] Game modes. (2019). Dota 2 Wiki. Retrieved 3 June 2019, from https://dota2.gamepedia.com/Game_modes#Captains_Mode
[2] Dota 2 Statistics. (n.d.). OpenDota. Retrieved 3 June 2019, from https://www.opendota.com/explorer
[3] DotA 2 Guide. (n.d.). Retrieved 3 June 2019, from https://purgegamers.true.io/g/dota-2-guide/
[4] Use Faceting for Radar Chart. (n.d.). The Python Graph Gallery. Retrieved 23 July 2019, from https://python-graph-gallery.com/392-use-faceting-for-radar-chart/

In addition to the references used in the study, the researchers would like to acknowledge Prof. Christian Alis, PhD, Prof. Erika Legara, PhD, and Prof. Eduardo David, Jr. for mentoring us throughout the course and imparting their knowledge on our journey to becoming Data Scientists.

11. Contact Us


Justine Buno: jbuno@aim.edu
Aries de Guzman: adeguzman@aim.edu
Ray Franco Rivera: rrivera@aim.edu
Paul Michael Uy: puy@aim.edu
